Results 1 - 14 of 14
1.
Eur Phys J B ; 85(6)2012 Jun 01.
Article in English | MEDLINE | ID: mdl-23645997

ABSTRACT

Segmentation is a standard method of data analysis to identify change-points dividing a nonstationary time series into homogeneous segments. However, for long-range fractal correlated series, most of the segmentation techniques detect spurious change-points which are simply due to the heterogeneities induced by the correlations and not to real nonstationarities. To avoid this oversegmentation, we present a segmentation algorithm which takes as a reference for homogeneity, instead of a random i.i.d. series, a correlated series modeled by a fractional noise with the same degree of correlations as the series to be segmented. We apply our algorithm to artificial series with long-range correlations and show that it systematically detects only the change-points produced by real nonstationarities and not those created by the correlations of the signal. Further, we apply the method to the sequence of the long arm of human chromosome 21, which is known to have long-range fractal correlations. We obtain only three segments that clearly correspond to the three regions of different G + C composition revealed by means of a multi-scale wavelet plot. Similar results have been obtained when segmenting all human chromosome sequences, showing the existence of previously unknown huge compositional superstructures in the human genome.

2.
Phys Rev E Stat Nonlin Soft Matter Phys ; 83(3 Pt 1): 031908, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21517526

ABSTRACT

Human DNA shows a complex structure with compositional features at many scales; the isochores, long DNA segments (~10^5 bp) of relatively homogeneous guanine-cytosine (G + C) content, are the largest well-documented and well-analyzed compositional structures. However, we report here on the existence of a high-level compositional organization of isochores in the human genome. By using a segmentation algorithm incorporating the long-range correlations existing in human DNA, we find that every chromosome is composed of a few huge segments (~10^7 bp) of relatively homogeneous G + C content, which become the largest compositional organization of the genome. Finally, we show evidence of the biological relevance of these superstructures, pointing to a large-scale functional organization of the human genome.


Subject(s)
DNA/chemistry, Human Genome, Algorithms, Base Composition, Chromosome Mapping, Human Chromosomes/ultrastructure, CpG Islands, Cytosine/chemistry, GC Rich Sequence, Guanine/chemistry, Humans, Statistical Models, Nucleic Acid Conformation, Nucleic Acid Repetitive Sequences, DNA Sequence Analysis
3.
Phys Rev E Stat Nonlin Soft Matter Phys ; 79(3 Pt 2): 035102, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19392005

ABSTRACT

Using a generalization of the level statistics analysis of quantum disordered systems, we present an approach able to automatically extract keywords from literary texts. Our approach takes into account not only the frequencies of the words present in the text but also their spatial distribution along the text, and is based on the fact that relevant words are significantly clustered (i.e., they self-attract each other), while irrelevant words are distributed randomly in the text. Since a reference corpus is not needed, our approach is especially suitable for single documents for which no a priori information is available. In addition, we show that our method also works on generic symbolic sequences (continuous texts without spaces), thus suggesting its general applicability.
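
The clustering idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the statistic itself, so the score used here (standard deviation of the gaps between successive occurrences of a word, divided by their mean) is an assumption chosen for illustration, and `clustering_scores` and `min_count` are hypothetical names. Evenly spread words score near 0, randomly placed words score near 1, and clustered words score above 1.

```python
from collections import defaultdict
from statistics import mean, pstdev


def clustering_scores(words, min_count=5):
    """Return {word: sigma}, where sigma = std(gaps) / mean(gaps).

    `gaps` are the distances between successive occurrences of a word.
    This normalised spread is an assumed, simplified clustering score,
    not necessarily the statistic used in the paper.
    """
    positions = defaultdict(list)
    for index, word in enumerate(words):
        positions[word].append(index)

    scores = {}
    for word, pos in positions.items():
        if len(pos) < min_count:
            continue  # too few occurrences for a meaningful spread
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        scores[word] = pstdev(gaps) / mean(gaps)
    return scores
```

Ranking words by this score, highest first, gives a crude keyword list that needs no reference corpus, which is the property the abstract emphasises.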

4.
Phys Rev E Stat Nonlin Soft Matter Phys ; 75(3 Pt 1): 032903, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17500745

ABSTRACT

The scale-free, long-range correlations detected in DNA sequences contrast with characteristic lengths of genomic elements, being particularly incompatible with the isochores (long, homogeneous DNA segments). By computing the local behavior of the scaling exponent alpha of detrended fluctuation analysis (DFA), we discriminate between sequences with and without true scaling, and we find that no single scaling exists in the human genome. Instead, human chromosomes show a common compositional structure with two characteristic scales, the large one corresponding to the isochores and the other to small and medium scale genomic elements.
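
As a rough illustration of the quantity this abstract refers to, the following sketch computes the DFA fluctuation function F(n) and a local scaling exponent alpha as the local slope of log F against log n. It is a generic textbook-style DFA written with NumPy, not the authors' code; the function names, the non-overlapping boxes, and the use of `np.gradient` for the local slope are assumptions.

```python
import numpy as np


def dfa_fluctuation(x, scales, order=1):
    """Detrended fluctuation F(n) for each box size n in `scales`."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())  # integrated, mean-subtracted series
    fluctuations = []
    for n in scales:
        n_boxes = len(profile) // n
        boxes = profile[: n_boxes * n].reshape(n_boxes, n)
        t = np.arange(n)
        # Fit one polynomial trend per box (columns of boxes.T are boxes).
        coeffs = np.polyfit(t, boxes.T, order)
        trends = np.vander(t, order + 1) @ coeffs
        residuals = boxes.T - trends
        fluctuations.append(np.sqrt(np.mean(residuals ** 2)))
    return np.array(fluctuations)


def local_alpha(scales, fluctuations):
    """Local slope d(log F) / d(log n), one value per scale."""
    return np.gradient(np.log(fluctuations), np.log(scales))
```

For a series with a single true scaling regime the local alpha stays roughly constant across scales; a systematic drift with n is the kind of signature the abstract uses to argue against a single scaling in the human genome.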


Subject(s)
Chromosome Mapping/methods, DNA Mutational Analysis/methods, Genetic Code/genetics, Human Genome/genetics, Genetic Models, DNA Sequence Analysis/methods, Base Sequence, Computer Simulation, Contig Mapping, Genetic Variation/genetics, Humans, Molecular Sequence Data, Quantitative Trait Loci/genetics
5.
Gene ; 300(1-2): 97-104, 2002 Oct 30.
Article in English | MEDLINE | ID: mdl-12468091

ABSTRACT

We present a coding measure which is based on the statistical properties of the stop codons, and that is able to estimate accurately the variation of coding content along an anonymous sequence. As the stop codons play the same role in all the genomes (with very few exceptions) the measure turns out to be species-independent. We show results both for prokaryotic and for eukaryotic genomes, indicating, first, the accuracy of the measure, and, second, that better prediction is achieved if the measure is applied on homogeneous, isochore-like sequences than if it is applied following the standard moving window approach. Finally, we discuss some of the possible applications of the measure.


Subject(s)
Terminator Codon/genetics, Open Reading Frames/genetics, Animals, Bacillus subtilis/genetics, Base Composition, Nucleic Acid Databases, Drosophila melanogaster/genetics, Human Genome, Humans, Isochores/genetics, Species Specificity, Statistics as Topic
6.
Gene ; 300(1-2): 105-15, 2002 Oct 30.
Article in English | MEDLINE | ID: mdl-12468092

ABSTRACT

Here we present a study of statistical correlations among different positions in DNA sequences and their implications by directly using the autocorrelation function. Such an analysis is possible now because of the availability of large sequences or even complete genomes of many organisms. After describing the way in which the autocorrelation function can be applied to DNA-sequence analysis, we show that long-range correlations, implying scale independence, appear in several bacterial genomes as well as in long human chromosome contigs. The source for such correlations in bacteria, which may extend up to 60 kb in Bacillus subtilis, may be related to massive lateral transfer of compositionally biased genes from other genomes. In the human genome, correlations extend for more than five decades and may be related to the evolution of the 'neogenome', a modern evolutionary acquisition composed of GC-rich isochores displaying long-range correlations and scale invariance.
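
A minimal sketch of how an autocorrelation function might be applied to a DNA sequence is given below. The abstract does not specify the numerical encoding, so mapping G and C to 1 and all other symbols to 0 is an assumption, as is the name `gc_autocorrelation`. The function returns the normalised autocorrelation for lags 1 to `max_lag` and assumes the sequence is not compositionally constant (otherwise the variance is zero).

```python
import numpy as np


def gc_autocorrelation(seq, max_lag):
    """Normalised autocorrelation of the G+C indicator for lags 1..max_lag."""
    # Assumed encoding: 1.0 for G or C, 0.0 for any other symbol.
    x = np.array([1.0 if base in "GC" else 0.0 for base in seq.upper()])
    x -= x.mean()
    variance = np.mean(x * x)  # zero only for a constant sequence
    return np.array(
        [np.mean(x[:-lag] * x[lag:]) / variance for lag in range(1, max_lag + 1)]
    )
```

Plotting the result against the lag on log-log axes is one way to check for the power-law decay that long-range correlations imply.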


Subject(s)
DNA/genetics, DNA Sequence Analysis/statistics & numerical data, Bacterial DNA/genetics, Bacterial Genome, Human Genome, Humans, DNA Sequence Analysis/methods, Statistics as Topic
8.
Phys Rev Lett ; 87(16): 168105, 2001 Oct 15.
Article in English | MEDLINE | ID: mdl-11690251

ABSTRACT

We introduce a segmentation algorithm to probe the temporal organization of heterogeneities in human heartbeat interval time series. We find that the lengths of segments with different local mean heart rates follow a power-law distribution and show that this scale-invariant structure is not a simple consequence of the long-range correlations present in the data. The differences in mean heart rates between consecutive segments display a common functional form, but with different parameters for healthy individuals and for heart-failure patients. These findings suggest that there is relevant physiological information hidden in the heterogeneities of the heartbeat time series.
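
The abstract does not state how segment boundaries are chosen. As a hedged illustration of one common way to split a series into segments with different local means, the sketch below scans every candidate cut and keeps the one that maximises a two-sample Student t statistic between the left and right parts. The choice of statistic, the function name `best_mean_split`, and the `min_len` parameter are assumptions, and a full algorithm would also test significance and recurse on each part.

```python
import numpy as np


def best_mean_split(x, min_len=10):
    """Return (index, t) of the cut maximising the pooled-variance t statistic."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    best_t, best_index = 0.0, None
    for i in range(min_len, n - min_len + 1):
        left, right = x[:i], x[i:]
        # Pooled standard deviation of the two candidate segments.
        pooled = np.sqrt(
            ((len(left) - 1) * left.var(ddof=1)
             + (len(right) - 1) * right.var(ddof=1)) / (n - 2)
        )
        std_err = pooled * np.sqrt(1.0 / len(left) + 1.0 / len(right))
        if std_err == 0:
            continue  # both parts constant: no usable statistic
        t = abs(left.mean() - right.mean()) / std_err
        if t > best_t:
            best_t, best_index = t, i
    return best_index, best_t
```

Applying the same search recursively to each accepted segment yields a list of segment lengths whose distribution can then be examined.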


Subject(s)
Heart Rate/physiology, Heart/physiology, Algorithms, Astronauts, Heart Diseases/physiopathology, Humans, Monte Carlo Method
9.
Gene ; 276(1-2): 47-56, 2001 Oct 03.
Article in English | MEDLINE | ID: mdl-11591471

ABSTRACT

Analytical DNA ultracentrifugation revealed that eukaryotic genomes are mosaics of isochores: long DNA segments (>>300 kb on average) relatively homogeneous in G+C. Important genome features are dependent on this isochore structure, e.g. genes are found predominantly in the GC-richest isochore classes. However, no reliable method is available to rigorously partition the genome sequence into relatively homogeneous regions of different composition, thereby revealing the isochore structure of chromosomes at the sequence level. Homogeneous regions are currently ascertained by plain statistics on moving windows of arbitrary length, or simply by eye on G+C plots. On the contrary, the entropic segmentation method is able to divide a DNA sequence into relatively homogeneous, statistically significant domains. An early version of this algorithm only produced domains having an average length far below the typical isochore size. Here we show that an improved segmentation method, specifically intended to determine the most statistically significant partition of the sequence at each scale, is able to identify the boundaries between long homogeneous genome regions displaying the typical features of isochores. The algorithm precisely locates classes II and III of the human major histocompatibility complex region, two well-characterized isochores at the sequence level, the boundary between them being the first isochore boundary experimentally characterized at the sequence level. The analysis is then extended to a collection of human large contigs. The relatively homogeneous regions we find show many of the features (G+C range, relative proportion of isochore classes, size distribution, and relationship with gene density) of the isochores identified through DNA centrifugation. Isochore chromosome maps, with many potential applications in genomics, are then drawn for all the completely sequenced eukaryotic genomes available.


Subject(s)
DNA/genetics, Eukaryotic Cells/metabolism, Genome, Animals, Base Composition, Chromosome Mapping, Fungal DNA/genetics, Plant DNA/genetics, GC Rich Sequence/genetics, Genes/genetics, Genetic Variation, Fungal Genome, Human Genome, Plant Genome, Humans, Major Histocompatibility Complex/genetics
10.
Phys Rev Lett ; 85(6): 1342-5, 2000 Aug 07.
Article in English | MEDLINE | ID: mdl-10991547

ABSTRACT

We present a new computational approach to finding borders between coding and noncoding DNA. This approach has two features: (i) DNA sequences are described by a 12-letter alphabet that captures the differential base composition at each codon position, and (ii) the search for the borders is carried out by means of an entropic segmentation method which uses only the general statistical properties of coding DNA. We find that this method is highly accurate in finding borders between coding and noncoding regions and requires no "prior training" on known data sets. Our results appear to be more accurate than those obtained with moving windows in the discrimination of coding from noncoding DNA.
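
Feature (i) can be illustrated with a short sketch that relabels each nucleotide with its position within the codon, giving 4 x 3 = 12 possible symbols. The symbol format (for example `A0`, `T1`, `G2`), the `frame` parameter, and the function names are assumptions made for illustration; the abstract only states that the alphabet has 12 letters.

```python
from collections import Counter


def to_12_letter(seq, frame=0):
    """Label each base with its codon phase, e.g. 'ATG' -> ['A0', 'T1', 'G2']."""
    return [f"{base}{(i - frame) % 3}" for i, base in enumerate(seq.upper())]


def composition(symbols):
    """Relative frequency of each symbol in the list."""
    counts = Counter(symbols)
    total = len(symbols)
    return {symbol: count / total for symbol, count in counts.items()}
```

In coding DNA the three phases have different base compositions, so the 12-symbol frequencies differ from those of noncoding DNA, where phase carries no information. An entropic segmentation can then search for cuts between regions with different 12-symbol compositions.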


Subject(s)
DNA/chemistry, DNA/genetics, Entropy, Genetic Code, Theoretical Models
11.
Bioinformatics ; 15(12): 974-9, 1999 Dec.
Article in English | MEDLINE | ID: mdl-10745986

ABSTRACT

MOTIVATION: DNA sequences are formed by patches or domains of different nucleotide composition. In a few simple sequences, domains can simply be identified by eye; however, most DNA sequences show a complex compositional heterogeneity (fractal structure), which cannot be properly detected by current methods. Recently, a computationally efficient segmentation method to analyse such nonstationary sequence structures, based on the Jensen-Shannon entropic divergence, has been described. Specific algorithms implementing this method are now needed. RESULTS: Here we describe a heuristic segmentation algorithm for DNA sequences, which was implemented on a Windows program (SEGMENT). The program divides a DNA sequence into compositionally homogeneous domains by iterating a local optimization procedure at a given statistical significance. Once a sequence is partitioned into domains, a global measure of sequence compositional complexity (SCC), accounting for both the sizes and compositional biases of all the domains in the sequence, is derived. SEGMENT computes SCC as a function of the significance level, which provides a multiscale view of sequence complexity.
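
The core step of a Jensen-Shannon segmentation can be sketched as follows: for every cut position, compute the divergence between the compositions of the left and right parts, weighted by their lengths, and keep the cut with the largest value. This is a minimal single-cut sketch, not the SEGMENT program; the function names are hypothetical, and the significance test and the iteration over resulting domains described in the abstract are left out.

```python
import math
from collections import Counter


def shannon(counts, total):
    """Shannon entropy in bits of a Counter holding `total` symbols."""
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)


def best_js_cut(seq):
    """Return (cut, divergence) maximising the Jensen-Shannon divergence.

    `cut` is the length of the left part; it is None if no cut gives a
    divergence above zero.
    """
    n = len(seq)
    total = Counter(seq)
    h_all = shannon(total, n)
    left = Counter()
    best_d, best_cut = 0.0, None
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = total - left  # Counter subtraction drops zero counts
        d = h_all - (i / n) * shannon(left, i) - ((n - i) / n) * shannon(right, n - i)
        if d > best_d:
            best_d, best_cut = d, i
    return best_cut, best_d
```

A full segmentation would accept the cut only if the divergence is significant at the chosen level and then repeat the search inside each of the two resulting domains.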


Subject(s)
Algorithms, DNA Sequence Analysis/methods, Data Display, Escherichia coli/genetics, Mathematical Computing, Genetic Models, Molecular Structure, Protein Tertiary Structure/genetics, Software, User-Computer Interface
12.
Genome Res ; 8(9): 916-28, 1998 Sep.
Article in English | MEDLINE | ID: mdl-9750191

ABSTRACT

The heterogeneity within, and similarities between, yeast chromosomes are studied. For the former, we show by the size distribution of domains, coding density, size distribution of open reading frames, spatial power spectra, and deviation from binomial distribution for C + G% in large moving windows that there is a strong deviation of the yeast sequences from random sequences. For the latter, not only do we graphically illustrate the similarity for the above mentioned statistics, but we also carry out a rigorous analysis of variance (ANOVA) test. The hypothesis that all yeast chromosomes are similar cannot be rejected by this test. We examine the two possible explanations of this interchromosomal uniformity: a common origin, such as genome-wide duplication (polyploidization), and a concerted evolutionary process.


Subject(s)
Base Composition, Fungal Chromosomes/chemistry, Saccharomyces cerevisiae/genetics, Analysis of Variance, Cytosine/analysis, Molecular Evolution, Guanine/analysis, Open Reading Frames, DNA Sequence Analysis
14.
J Theor Biol ; 160(4): 457-70, 1993 Feb 21.
Article in English | MEDLINE | ID: mdl-8501918

ABSTRACT

A new method to determine entropic profiles in DNA sequences is presented. It is based on the chaos-game representation (CGR) of gene structure, a technique which produces a fractal-like picture of DNA sequences. First, the CGR image was divided into squares 4^-m in size (m being the desired resolution), and the point density counted. Second, appropriate intervals were adjusted, and then a histogram of densities was prepared. Third, Shannon's formula was applied to the probability-distribution histogram, thus obtaining a new entropic estimate for DNA sequences, the histogram entropy, a measurement that goes with the level of constraints on the DNA sequence. Lastly, the entropic profile for the sequence was drawn, by considering the entropies at each resolution level, thus providing a way to summarize the complexity of large genomic regions or even entire genomes at different resolution levels. The application of the method to DNA sequences reveals that entropic profiles obtained in this way, as opposed to previously published ones, clearly discriminate between random and natural DNA sequences. Entropic profiles also show a different degree of variability within and between genomes. The results of these analyses are discussed in relation both to the genome compartmentalization in vertebrates and to the differential action of compositional and/or functional constraints on DNA sequences.
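
The steps listed in this abstract can be sketched roughly as below: build the CGR points, count them on a 2^m x 2^m grid (so each cell has area 4^-m), form a histogram of the cell counts, and apply Shannon's formula to that histogram. Several details are assumptions because the abstract leaves them open: the corner assigned to each base, the starting point (0.5, 0.5), the use of one histogram bin per distinct count value in place of the "appropriate intervals", and the function names.

```python
import math
from collections import Counter

# Assumed corner assignment of the unit square; conventions vary.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_cell_counts(seq, m):
    """Count CGR points per cell on a grid with 2**m cells per side."""
    side = 2 ** m
    counts = Counter()
    x, y = 0.5, 0.5
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway to the corner
        cell = (min(int(x * side), side - 1), min(int(y * side), side - 1))
        counts[cell] += 1
    return counts, side


def histogram_entropy(seq, m):
    """Shannon entropy (bits) of the distribution of cell densities."""
    counts, side = cgr_cell_counts(seq, m)
    n_cells = side * side
    density_hist = Counter(counts.values())     # how many cells hold k points
    density_hist[0] = n_cells - len(counts)     # empty cells
    return -sum(
        k / n_cells * math.log2(k / n_cells) for k in density_hist.values() if k
    )
```

Evaluating `histogram_entropy` for m = 1, 2, 3, ... gives one value per resolution level, which is the kind of entropic profile the abstract describes.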


Subject(s)
Computer Simulation, Game Theory, Information Theory, DNA Sequence Analysis, Thermodynamics, Animals, Base Sequence, Humans, Vertebrates/genetics